Discovering Demographic Language Variation

Authors

  • Brendan O’Connor
  • Jacob Eisenstein
  • Eric P. Xing
  • Noah A. Smith

Abstract

We propose a Bayesian generative model of how demographic social factors influence lexical choice. We apply the method to a corpus of geo-tagged Twitter messages originating from mobile phones, cross-referenced against U.S. Census demographic data. Our method discovers communities jointly defined by linguistic and demographic properties.
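The abstract does not spell out the model. What follows is only a hypothetical toy sketch of the general idea it describes: a latent community that jointly generates a user's demographic attributes and word choices. The single Gaussian demographic variable, the per-community word distribution, the placeholder vocabulary, and the names `sample_users` and `posterior_community` are all illustrative assumptions. None of them is the authors' model.

```python
import math
import random

# Hypothetical toy model, NOT the model from the paper: each user belongs to one
# latent community k, and that community generates both a demographic value and words.


def sample_users(n_users, communities, vocab, n_words, seed=0):
    """Sample (community, demographic_value, words) triples from the toy process.

    Each community is a dict with (hypothetical) keys:
      'weight'     -- prior probability of the community
      'demo_mean'  -- mean of a Gaussian over one demographic variable
      'demo_sd'    -- standard deviation of that Gaussian
      'word_probs' -- probability of each vocabulary word, aligned with vocab
    """
    rng = random.Random(seed)
    weights = [c["weight"] for c in communities]
    users = []
    for _ in range(n_users):
        # Draw the latent community, then draw observations conditioned on it.
        k = rng.choices(range(len(communities)), weights=weights)[0]
        c = communities[k]
        demo = rng.gauss(c["demo_mean"], c["demo_sd"])
        words = rng.choices(vocab, weights=c["word_probs"], k=n_words)
        users.append((k, demo, words))
    return users


def posterior_community(demo, words, communities, vocab):
    """Return P(community | demographic value, words) via Bayes' rule."""
    index = {w: i for i, w in enumerate(vocab)}
    log_scores = []
    for c in communities:
        lp = math.log(c["weight"])
        # Gaussian log density of the observed demographic value.
        z = (demo - c["demo_mean"]) / c["demo_sd"]
        lp += -0.5 * z * z - math.log(c["demo_sd"] * math.sqrt(2 * math.pi))
        # Words are conditionally independent given the community.
        lp += sum(math.log(c["word_probs"][index[w]]) for w in words)
        log_scores.append(lp)
    # Normalize in log space for numerical stability.
    m = max(log_scores)
    exps = [math.exp(s - m) for s in log_scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    vocab = ["word_a", "word_b", "word_c"]  # placeholder vocabulary
    communities = [
        {"weight": 0.5, "demo_mean": 0.2, "demo_sd": 0.1, "word_probs": [0.8, 0.1, 0.1]},
        {"weight": 0.5, "demo_mean": 0.8, "demo_sd": 0.1, "word_probs": [0.1, 0.1, 0.8]},
    ]
    users = sample_users(5, communities, vocab, n_words=4)
    for k, demo, words in users:
        print(k, round(demo, 2), words)
    print(posterior_community(0.2, ["word_a", "word_a"], communities, vocab))
```

With two communities that differ in both demographic mean and word preferences, `posterior_community` favors the community whose profile explains both observations at once. This is the sense in which such a model couples language and demographics. Subtracting the maximum log score before exponentiating keeps the normalization stable when a user has many words.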

Similar resources

Exploring Language Variation Across Europe - A Web-based Tool for Computational Sociolinguistics

Language varies not only between countries, but also along regional and socio-demographic lines. This variation is one of the driving factors behind language change. However, investigating language variation is a complex undertaking: the more factors we want to consider, the more data we need. Traditional qualitative methods are not well-suited to do this, and therefore restricted to isolated f...

Discovering Stylistic Variations in Distributional Vector Space Models via Lexical Paraphrases

Detecting and analyzing stylistic variation in language is relevant to diverse Natural Language Processing applications. In this work, we investigate whether salient dimensions of style variations are embedded in standard distributional vector spaces of word meaning. We hypothesize that distances between embeddings of lexical paraphrases can help isolate style from meaning variations and help i...

Generative Typology

This article lays out an approach that combines a formal-generative perspective on language, including tolerance of abstract analyses, with a typological focus on comparing unrelated languages from around the world. It argues that this can be a powerful combination for discovering linguistic universals and patterns in linguistic variation that are not detected by other means.

Late Talkers: Do Good Predictors of Outcome Exist?

Both small-scale and epidemiological longitudinal studies of early language delay indicate that most late talkers attain language scores in the average range by age 5, 6, or 7. However, late talker groups typically obtain significantly lower scores than groups with typical language histories on most language measures into adolescence. These findings support a dimensional account of language del...

Language and Variation: A Study of English and Persian Wh-questions

It was claimed by variationists that languages experience variation at all levels, which is supposed to be patterned. The present study aimed at exploring how variation occurred in English and Persian wh-questions. More specifically, it investigated whether such a variation was systematic and patterned. To this end, a modified version of the Edinburgh Map Task was used in data collection. The p...

Publication date: 2010